Feature Selection using Misclassification Counts
Authors
Abstract
Dimensionality reduction of the problem space, through detection and removal of variables that contribute little or nothing to classification, can relieve both the computational load and the instance-acquisition effort, since every data attribute is accessed on each pass. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to the coordinates of informative features. Variables are ranked by the degree to which they exhibit random behavior. The results are verified using the Nearest Neighbor classifier, which also helps to address feature irrelevance and redundancy, issues that ranking alone does not immediately resolve. Additionally, feature ranking methods from independent sources are included for direct comparison.
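The abstract does not spell out the exact misclassification-count criterion, so the sketch below is only a rough illustration of the overall workflow: each feature is scored by how strongly its class centers separate relative to within-class spread (near-random features score close to zero), features are ranked by that score, and the ranking is checked with a leave-one-out 1-Nearest-Neighbor classifier. The scoring function and the toy data are assumptions, not the paper's method.

```python
# Hypothetical sketch of rank-then-verify feature selection; the separability
# score stands in for the paper's (unspecified) misclassification-count measure.
import random

def separability_score(xs, ys):
    """Between-class spread of class centers divided by within-class spread
    for a single feature; a feature behaving randomly scores near 0."""
    classes = set(ys)
    overall = sum(xs) / len(xs)
    between, within = 0.0, 0.0
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        center = sum(vals) / len(vals)
        between += len(vals) * (center - overall) ** 2
        within += sum((v - center) ** 2 for v in vals)
    return between / (within + 1e-12)

def rank_features(X, y):
    """Return feature indices ordered from most to least informative."""
    scores = [separability_score([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def nn_errors(X, y, feats):
    """Leave-one-out 1-NN misclassification count on the chosen features."""
    errors = 0
    for i in range(len(X)):
        best_label, best_d = None, float("inf")
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in feats)
            if d < best_d:
                best_label, best_d = y[k], d
        errors += best_label != y[i]
    return errors

# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
random.seed(0)
X = [[cls + random.gauss(0, 0.3), random.gauss(0, 1.0)]
     for cls in (0, 1) for _ in range(30)]
y = [cls for cls in (0, 1) for _ in range(30)]

order = rank_features(X, y)
print(order[0])  # the informative feature should be ranked first
print(nn_errors(X, y, [order[0]]), "vs", nn_errors(X, y, [order[1]]))
```

Verifying the ranking with a classifier, as the paper does, catches cases a score alone cannot: two redundant copies of the same feature both rank highly, yet keeping both does not lower the 1-NN error.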
Similar Articles
Credit Card Fraud Detection using Data mining and Statistical Methods
Due to today’s advancements in technology and business, fraud detection has become a critical component of financial transactions. Given the vast amounts of data in large datasets, detecting fraudulent transactions manually becomes increasingly difficult. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...
Probabilistic Token Selection via Fisher’s Method in Text Classification
In this project we consider a multiclass text classification problem on three newsgroups with 1,000 entries each with a feature class consisting of over 50,000 tokens. Our baseline Naive Bayes method gives a misclassification error rate of 4.51%, and we focus on variable selection methods to improve upon this error. We compare a token selection method using Naive Bayes to one using the related ...
The CASH algorithm-cost-sensitive attribute selection using histograms
Feature selection is an essential process for machine learning tasks since it improves generalization capabilities, and reduces run-time and a model’s complexity. In many applications, the cost of collecting the features must be taken into account. To cope with the cost problem, we developed a new cost-sensitive fitness function based on histogram comparison. This function is integrated with a ge...
Second Order Cone Programming Formulations for Feature Selection
This paper addresses the issue of feature selection for linear classifiers given the moments of the class conditional densities. The problem is posed as finding a minimal set of features such that the resulting classifier has a low misclassification error. Using a bound on the misclassification error involving the mean and covariance of class conditional densities and minimizing an L1 norm as a...
Feature Selection Using Classifier in High Dimensional Data
Feature selection is frequently used as a pre-processing step to machine learning. It is a process of choosing a subset of original features so that the feature space is optimally reduced according to a certain evaluation criterion. The central objective of this paper is to reduce the dimension of the data by finding a small set of important features which can give good classification performan...